Abstract: The recently proposed theory of distilled sensing
establishes that adaptivity in sampling can dramatically improve
the performance of sparse recovery in noisy settings. In particular,
it is now known that adaptive point sampling enables the detection
and/or support recovery of sparse signals that are otherwise too weak
to be recovered using any method based on non-adaptive point sampling.
In this paper the theory of distilled sensing is extended to
highly-undersampled regimes, as in compressive sensing. A simple
adaptive sampling-and-refinement procedure called compressive
distilled sensing is proposed, where each step of the procedure
utilizes information from previous observations to focus subsequent
measurements into the proper signal subspace, resulting in a
significant improvement in effective measurement SNR on the signal
subspace. As a result, for the same budget of sensing resources,
compressive distilled sensing can result in significantly improved
error bounds compared to those for traditional compressive sensing.

I.  Introduction

Let $x \in \mathbb{R}^n$ be a sparse vector supported on the set
$S = \{i : x_i \neq 0\}$, where $|S| = s \ll n$, and consider observing
$x$ according to the linear observation model

$$y = Ax + w, \tag{1}$$

where $A$ is an $m \times n$ real-valued matrix (possibly random)
that satisfies $E[\|A\|_F^2] \le n$, and where the $w_i$ are i.i.d.
$\mathcal{N}(0, \sigma^2)$ for some $\sigma > 0$. This model is central to
the emerging field of compressive sensing (CS), which deals primarily
with recovery of $x$ in highly-underdetermined settings (that is, where
the number of measurements $m \ll n$).

Initial results in CS establish a rather surprising fact: using
certain observation matrices $A$ for which the number of rows is a
constant multiple of $s \log n$, it is possible to recover $x$ exactly
from $\{y, A\}$ when the observations are noise-free, and in addition,
the recovery can be accomplished by solving a tractable convex
optimization [1]-[3]. Matrices $A$ for which this exact recovery is
possible are easy to construct in practice. For example, matrices whose
entries are i.i.d. realizations of certain zero-mean distributions
(Gaussian, symmetric Bernoulli, etc.) are sufficient to allow this
recovery with high probability [2]-[4].

In practice, however, it is rarely the case that observations
are perfectly noise-free. In these settings, rather than attempt
to recover $x$ exactly, the goal becomes to estimate $x$ to high
accuracy in some metric (such as the $\ell_2$ norm) [5], [6]. One
such estimation procedure is the Dantzig selector, proposed
in [6], whose analysis establishes that CS recovery remains stable in
the presence of noise. We state the result here as a lemma.

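Lemma 1 (Dantzig selector). For $m = O(s \log n)$, generate
a random $m \times n$ matrix $A$ whose entries are i.i.d.
$\mathcal{N}(0, 1/m)$, and collect observations $y$ according to (1).
The estimate

$$\hat{x} = \arg\min_{z \in \mathbb{R}^n} \|z\|_{\ell_1} \quad \text{subject to} \quad \|A^T(y - Az)\|_{\ell_\infty} \le \lambda,$$

where $\lambda = \Theta(\sigma\sqrt{\log n})$, satisfies
$\|\hat{x} - x\|_{\ell_2}^2 = O(s\sigma^2 \log n)$, with probability
$1 - O(n^{-C_0})$ for some constant $C_0 > 0$.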

Remark 1. The constants in the above can be specified
explicitly (or bounded appropriately), but we choose to present
the results here and where appropriate in the sequel in terms
of scaling relationships¹ in the interest of simplicity.

On the other hand, suppose that an oracle were to identify
the locations of the nonzero signal components (or equivalently,
the support $S$) prior to recovery. Then one could construct the
least-squares estimate

$$\hat{x}_{LS} = (A_S^T A_S)^{-1} A_S^T y,$$

where $A_S$ denotes the submatrix of $A$ formed from the
columns indexed by the elements of $S$. The error of this estimate
is $\|\hat{x}_{LS} - x\|_{\ell_2}^2 = O(s\sigma^2)$ with probability
$1 - O(n^{-C_1})$ for some $C_1 > 0$, as shown in [6]. Comparing this
oracle-assisted bound with the result of Lemma 1, we see that the
primary difference is the presence of the logarithmic term in
the error bound of the latter, which can be interpreted as the
"searching penalty" associated with having to learn the correct
signal subspace.

Of course, the signal subspace will rarely (if ever) be
known a priori. But suppose that it were possible to learn
the signal subspace from the data, in a sequential, adaptive
fashion, as the data are collected. In this case, sensing energy
could be focused only into the true signal subspace, gradually
improving the effective measurement SNR. Intuitively, one
might expect that this type of procedure could ultimately
yield an estimate whose accuracy would be closer to that of
the oracle-assisted estimator, since the effective observation
matrix would begin to assume the structure of $A_S$. Such
adaptive compressive sampling methods have been proposed
and examined empirically [7]-[9], but to date the performance
benefits of adaptivity in compressive sampling have not been
established theoretically.

¹ Recall that for functions $f = f(n)$ and $g = g(n)$, $f = O(g)$ means
$f \le cg$ for some constant $c$ for all $n$ sufficiently large,
$f = \Omega(g)$ means $f \ge c'g$ for a constant $c'$ for all $n$
sufficiently large, and $f = \Theta(g)$ means that $f = O(g)$ and
$f = \Omega(g)$. In addition, we will use the notation $f = o(g)$
to indicate that $\lim_{n \to \infty} f/g = 0$.

In this paper we take a step in that direction by analyzing
the performance of a multi-step adaptive sampling-and-refinement
procedure called compressive distilled sensing (CDS), extending
our own prior work in distilled sensing, where the theoretical
advantages of adaptive sampling in "uncompressed" settings were
quantified [10], [11]. Our main results here guarantee that, for
signals having not too many nonzero entries, and for which the
dynamic range is not too large, a total of $O(s \log n)$
adaptively-collected measurements yield an estimator that, with high
probability, achieves the $O(s\sigma^2)$ error bound of the
oracle-assisted estimator.

The  remainder  of  the  paper  is  organized  as  follows.  The 
CDS  procedure  is  described  in  Sec.  II,  and  its  performance  is 
quantified  as  a  theorem  in  Sec.  III.  Extensions  and  conclusions 
are  briefly  described  in  Sec.  IV,  and  a  sketch  of  the  proof  of  the 
main  result  and  associated  lemmata  appear  in  the  Appendix. 

II.  Compressive  Distilled  Sensing 

In this section we describe the compressive distilled sensing
(CDS) procedure, which is a natural generalization of the distilled
sensing (DS) procedure [10], [11]. The CDS procedure,
given in Algorithm 1, is an adaptive procedure comprised of an
alternating sequence of sampling (or observation) steps and
refinement (or distillation) steps, and for which the observations
are subject to a global budget of sensing resources (or "sensing
energy") that effectively quantifies the average measurement
precision. The key point is that the adaptive nature of the
procedure allows for sensing resources to be allocated
non-uniformly; in particular, proportionally more of the resources
can be devoted to subspaces of interest as they are identified.

In the $j$th sampling step (for $j = 1, \dots, k$), we collect
measurements only at locations of $x$ corresponding to indices
in a set $I^{(j)}$ (where $I^{(1)} = \{1, \dots, n\}$ initially). The
$j$th refinement step (for $j = 1, \dots, k-1$) identifies the set of
locations $I^{(j+1)} \subset I^{(j)}$ for which the corresponding signal
components are to be measured in step $j+1$. It is clear that
in order to leverage the benefit of adaptivity, the distillation
step should have the property that $I^{(j+1)}$ contains most (or
ideally, all) of the indices in $I^{(j)}$ that correspond to true signal
components. In addition, and perhaps more importantly, we
also want the set $I^{(j+1)}$ to be significantly smaller than
$I^{(j)}$, since in that case we can realize an SNR improvement from
focusing our sensing resources into the appropriate subspace.

In the DS procedure examined in [10], [11], observations
were in the form of noisy samples of $x$ at any location
$i \in \{1, \dots, n\}$ at each step $j$. In that case it was shown
that a simple refinement operation (identifying all locations for
which the corresponding observation exceeded a threshold)
was sufficient to ensure that (with high probability) $I^{(j+1)}$
would contain most of the indices in $I^{(j)}$ corresponding to
true signal components, but only about half of the remaining
indices, even when the signal is very weak. On the other
hand, here we utilize a compressive sensing observation model
where at each step the observations are in the form of a
low-dimensional vector $y \in \mathbb{R}^m$, with $m \ll n$. In an attempt
to mimic the uncompressed case, here we propose a similar
refinement step applied to the "back-projection" estimate
$(A^{(j)})^T y^{(j)} = \tilde{x}^{(j)} \in \mathbb{R}^n$, which can essentially
be thought of as one of many possible estimates or reconstructions
of $x$ that can be obtained from $y^{(j)}$ and $A^{(j)}$. The results in the
next section quantify the improvements that can be achieved
using this approach.

Algorithm 1: Compressive distilled sensing (CDS).

Input:
    Number of observation steps $k$;
    $R^{(j)}$, $j = 1, \dots, k$, such that $\sum_{j=1}^{k} R^{(j)} \le n$;
    $m^{(j)}$, $j = 1, \dots, k$, such that $\sum_{j=1}^{k} m^{(j)} \le m$;
Initialize:
    Initial index set $I^{(1)} = \{1, 2, \dots, n\}$;
Distillation:
    for $j = 1$ to $k$ do
        Compute $\tau^{(j)} = R^{(j)} / |I^{(j)}|$;
        Construct $A^{(j)}$, where the entries $A^{(j)}_{u,v}$ are
            i.i.d. $\mathcal{N}(0, \tau^{(j)}/m^{(j)})$ for $u \in \{1, \dots, m^{(j)}\}$, $v \in I^{(j)}$,
            and $0$ for $u \in \{1, \dots, m^{(j)}\}$, $v \notin I^{(j)}$;
        Collect $y^{(j)} = A^{(j)} x + w^{(j)}$;
        Compute $\tilde{x}^{(j)} = (A^{(j)})^T y^{(j)}$;
        Refine $I^{(j+1)} = \{i \in I^{(j)} : \tilde{x}^{(j)}_i > 0\}$;
    end
Output:
    Distilled observations $\{y^{(k)}, A^{(k)}\}$
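As a rough illustration of Algorithm 1, the following standard-library Python sketch runs the sampling and refinement loop on a small synthetic signal. The function name `cds`, the toy budgets, and the signal amplitude are assumptions made for this example only; refinement is skipped after the last step, since only the final observations are used afterwards.

```python
import math
import random


def cds(x, R, m, sigma, rng):
    """Sketch of the sampling/refinement loop of Algorithm 1.

    R[j] and m[j] are the energy and measurement budgets of step j.
    Returns the final index set and the last observation pair (y, A).
    """
    n = len(x)
    index_set = list(range(n))            # I^(1): every location is a candidate
    y, A = [], []
    for step, (R_j, m_j) in enumerate(zip(R, m)):
        if not index_set:                 # nothing left to measure
            break
        tau = R_j / len(index_set)        # tau^(j) = R^(j) / |I^(j)|
        std = math.sqrt(tau / m_j)
        active = set(index_set)
        # Rows of A^(j): N(0, tau/m) on active columns, zero elsewhere
        A = [[rng.gauss(0.0, std) if v in active else 0.0 for v in range(n)]
             for _ in range(m_j)]
        # y^(j) = A^(j) x + w^(j)
        y = [sum(a * xv for a, xv in zip(row, x)) + rng.gauss(0.0, sigma)
             for row in A]
        if step == len(R) - 1:
            break                         # no refinement after the final step
        # Back-projection (A^(j))^T y^(j); keep indices with positive entries
        x_tilde = {i: sum(A[u][i] * y[u] for u in range(m_j))
                   for i in index_set}
        index_set = [i for i in index_set if x_tilde[i] > 0]
    return index_set, y, A


rng = random.Random(0)
n, s, sigma = 400, 4, 1.0
x = [0.0] * n
support = rng.sample(range(n), s)
for i in support:
    x[i] = 8.0                            # toy amplitude, not from the paper
final_set, y, A = cds(x, R=[100.0, 60.0, 100.0], m=[40, 40, 60],
                      sigma=sigma, rng=rng)
print(len(final_set), sum(i in final_set for i in support))
```

Each refinement discards roughly half of the zero locations, so the last step spreads its energy over far fewer columns than the first.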

III.  Main  Results 

To state our main results, we set the input parameters of
Algorithm 1 as follows. Choose $\alpha \in (0, 1/3)$, let
$b = (1 - \alpha)/(1 - 2\alpha)$, and let $k = 1 + \lceil \log_b \log n \rceil$.
Allocate sensing resources according to

$$R^{(j)} = \begin{cases} \alpha n \left(\dfrac{1 - 2\alpha}{1 - \alpha}\right)^{j-1}, & j = 1, \dots, k-1, \\ \alpha n, & j = k, \end{cases}$$

and note that this allocation guarantees that $R^{(j+1)}/R^{(j)} > 1/2$
and $\sum_{j=1}^{k} R^{(j)} \le n$. The latter inequality ensures that
the total sensing energy does not exceed the total sensing
energy used in conventional CS. The number of measurements
acquired in each step is

$$m^{(j)} = \begin{cases} \rho_0 s \log n/(k-1), & j = 1, \dots, k-1, \\ \rho_1 s \log n, & j = k, \end{cases}$$

for some constants $\rho_0$ (which depends on the dynamic range)
and $\rho_1$ (sufficiently large so that the results of Lemma 1 hold).
Note that $m = O(s \log n)$, the same order as the minimum
number of measurements required by conventional CS.
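The parameter choices of this section can be tabulated with a few lines of Python. This is a sketch under stated assumptions: natural logarithms are used, the energy allocation $R^{(j)} = \alpha n\,((1-2\alpha)/(1-\alpha))^{j-1}$ for $j < k$ and $R^{(k)} = \alpha n$ is read off from the bounds on $\tau^{(j)}$ in the Appendix, and the values passed for `rho0` and `rho1` are placeholders, since the text leaves those constants unspecified.

```python
import math


def cds_parameters(n, s, alpha, rho0, rho1):
    """Return (b, k, R, m) for the parameter choices of Sec. III.

    rho0 and rho1 stand in for the unspecified constants of the text.
    """
    if not 0.0 < alpha < 1.0 / 3.0:
        raise ValueError("alpha must lie in (0, 1/3)")
    if n < 3:
        raise ValueError("n must be at least 3 so that log log n > 0")
    b = (1.0 - alpha) / (1.0 - 2.0 * alpha)
    # k = 1 + ceil(log_b(log n)), with natural logarithms assumed
    k = 1 + math.ceil(math.log(math.log(n)) / math.log(b))
    # Geometrically decaying energy for steps 1..k-1, then alpha*n at step k
    R = [alpha * n * (1.0 / b) ** (j - 1) for j in range(1, k)] + [alpha * n]
    # Measurements per step, rounded up to whole numbers
    m = [math.ceil(rho0 * s * math.log(n) / (k - 1))] * (k - 1)
    m.append(math.ceil(rho1 * s * math.log(n)))
    return b, k, R, m


b, k, R, m = cds_parameters(n=10_000, s=10, alpha=0.25, rho0=2.0, rho1=2.0)
print(k, round(sum(R), 1), m)
```

With these inputs the total energy stays below $n$ and every consecutive ratio of the $R^{(j)}$ exceeds 1/2, matching the two guarantees stated above.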
Our main result of the paper, stated below and proved in the
Appendix, quantifies the error performance of one particular
estimate obtained from adaptive observations collected using
the CDS procedure.

Theorem 1. Assume that $x \in \mathbb{R}^n$ is sparse with
$s = n^{\beta/\log\log n}$ for some constant $0 < \beta < 1$. Furthermore,
assume that each non-zero component of $x$ satisfies
$\sigma\mu \le x_i \le D\sigma\mu$, for some $\mu > 0$. Here $\sigma$ is the
noise standard deviation, $D > 1$ is the dynamic range of the signal,
and $\mu^2$ is the SNR. Adaptively measure $x$ according to Algorithm 1
with the input parameters as specified above, and construct the
estimator $\hat{x}_{CDS}$ by applying the Dantzig selector with
$\lambda = \Theta(\sigma)$ to the output of the algorithm (i.e., with
$A = A^{(k)}$ and $y = y^{(k)}$).

1) There exists $\mu_0 = \Omega(\sqrt{\log n/\log\log n})$ such that if
$\mu \ge \mu_0$, then $\|\hat{x}_{CDS} - x\|_{\ell_2}^2 = O(s\sigma^2)$, with
probability $1 - O(n^{-C_0/\log\log n})$, for some $C_0 > 0$.

2) There exists $\mu_1 = \Omega(\sqrt{\log\log\log n})$ such that if
$\mu_1 \le \mu < \mu_0$, then $\|\hat{x}_{CDS} - x\|_{\ell_2}^2 = O(s\sigma^2)$,
with probability $1 - O(e^{-C_1\mu^2})$, for some $C_1 > 0$.

3) If $\mu < \mu_1$, then
$\|\hat{x}_{CDS} - x\|_{\ell_2}^2 = O(s\sigma^2 \log\log\log n)$,
with probability $1 - O(n^{-C_2})$, for some $C_2 > 0$.

In words, when the SNR is sufficiently large, the estimate
achieves the error performance of the oracle-assisted estimator,
albeit with a lower (slightly sub-polynomial) convergence rate.
For a class of slightly weaker signals the oracle-assisted error
performance is still achieved, but with a rate of convergence
that is inversely proportional to the SNR. Note that we may
summarize the results of the theorem with the general claim
$\|\hat{x}_{CDS} - x\|_{\ell_2}^2 = O(s\sigma^2 \log\log\log n)$ with
probability $1 - o(1)$. It is worth pointing out that for many problems
of practical interest the $\log\log\log n$ term can be negligible,
whereas $\log n$ is not; for example, $\log\log\log(10^6) < 1$, but
$\log(10^6) \approx 14$.
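The closing arithmetic is easy to reproduce. The short check below assumes natural logarithms, which is the reading under which $\log(10^6) \approx 14$.

```python
import math

n = 10 ** 6
log_n = math.log(n)                        # natural logarithm of n
logloglog_n = math.log(math.log(log_n))    # three nested logarithms

print(round(log_n, 2), round(logloglog_n, 3))
```

For this $n$ the triple logarithm stays below one while $\log n$ is close to 14, which is the contrast the paragraph draws between the CDS and conventional CS error bounds.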

IV.  Extensions and Conclusions

Although the CDS procedure was specified under the assumption
that the nonzero signal components were positive, it can be
easily extended to signals having negative entries as well. In
that case, one could split the budget of sensing resources in half,
executing the procedure once as written and again replacing the
refinement step by
$I^{(j+1)} = \{i \in I^{(j)} : \tilde{x}^{(j)}_i < 0\}$. In addition, the
results presented here also apply if the signal is sparse in another
basis. To implement the procedure in that case, one would generate
the $A^{(j)}$ as above, but observations of $x$ would be obtained using
$A^{(j)} T$, where $T \in \mathbb{R}^{n \times n}$ is an appropriate
orthonormal transformation matrix (discrete wavelet or cosine
transform, for example). In either case the qualitative behavior is
the same: observations are collected by projecting $x$ onto a
superposition of basis elements from the appropriate basis.

We have shown here that the compressive distilled sensing
procedure can significantly improve the theoretical performance
of compressive sensing. In experiments, not shown here due to space
limitations, we have found that CDS can perform significantly better
than CS in practice, like similar previously proposed adaptive
methods [7]-[9]. We remark that our theoretical analysis shows that
CDS is sensitive to the dynamic range of the signal. This is an
artifact of the method for obtaining the signal estimate
$\tilde{x}^{(j)}$ at each step. As alluded to at the end of Section II,
$\tilde{x}^{(j)}$ could be obtained using any of a number of methods
including, for example, Dantzig selector estimation (with a smaller
value of $\lambda$) or other mixed-norm reconstruction techniques such
as LASSO with sufficiently small regularization parameters. Such
extensions will be explored in future work.

V.  Appendix

A.  Lemmata

We first establish several key lemmata that will be used in
the sketch of the proof of the main result. In particular, the
first two results presented below quantify the effects of each
refinement step.

Lemma 2. Let $x \in \mathbb{R}^n$ be supported on $S$ with $|S| = s$,
and let $x_S$ denote the subvector of $x$ composed of entries of $x$
whose indices are in $S$. Let $A$ be an $m \times n$ matrix whose
entries are i.i.d. $\mathcal{N}(0, \tau/m)$ for some
$0 < \tau_{\min} \le \tau$, and let $A_S$ and $A_{S^c}$ be submatrices of
$A$ composed of the columns of $A$ corresponding to the indices in the
sets $S$ and $S^c$, respectively. Let $w \in \mathbb{R}^m$ be
independent of $A$ and have i.i.d. $\mathcal{N}(0, \sigma^2)$ entries.
For the $z \times 1$ vector $U = A_{S^c}^T A_S x_S + A_{S^c}^T w$, where
$z = |S^c| = n - s$, we have

$$(1/2 - \epsilon)\, z \le \sum_{i=1}^{z} 1_{\{U_i > 0\}} \le (1/2 + \epsilon)\, z$$

for any $\epsilon \in (0, 1/2)$ with probability at least
$1 - 2\exp(-2\epsilon^2 z)$.

Proof: Define $Y = Ax + w = A_S x_S + w$, and note that
given $Y$, the entries of $U = A_{S^c}^T Y$ are i.i.d.
$\mathcal{N}(0, \|Y\|_{\ell_2}^2 \tau/m)$. Thus, when $Y \ne 0$ we have
$\Pr(U_i > 0) = 1/2$ for all $i = 1, \dots, z$. Let
$T_i = 1_{\{U_i > 0\}}$ and apply Hoeffding's inequality
to obtain that for any $\epsilon \in (0, 1/2)$,

$$\Pr\left( \left| \sum_{i=1}^{z} T_i - \frac{z}{2} \right| > \epsilon z \;\middle|\; Y : Y \ne 0 \right) \le 2\exp(-2\epsilon^2 z).$$

Now, we integrate to obtain

$$\Pr\left( \left| \sum_{i=1}^{z} T_i - \frac{z}{2} \right| > \epsilon z \right) \le \int_{Y : Y \ne 0} 2\exp(-2\epsilon^2 z)\, dP_Y + \int_{Y : Y = 0} 1\, dP_Y \le 2\exp(-2\epsilon^2 z).$$

The last result follows from the fact that the event $Y = 0$ has
probability zero since $Y$ is Gaussian-distributed. ■
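Lemma 2 can be illustrated with a small Monte Carlo experiment. The sketch below uses only the Python standard library; the function name, the problem sizes, and the choice of equal signal amplitudes $\sigma\mu$ are assumptions made for the example. It draws $Y = A_S x_S + w$ once, then forms each entry of $U$ from a fresh Gaussian column, which matches the conditional model used in the proof.

```python
import math
import random


def positive_fraction(n, s, m, tau, sigma, mu, rng):
    """Fraction of the z = n - s entries of U = A_{S^c}^T Y that are positive."""
    std = math.sqrt(tau / m)
    # Y = A_S x_S + w, with every nonzero entry of x set to sigma * mu
    Y = [sigma * mu * sum(rng.gauss(0.0, std) for _ in range(s))
         + rng.gauss(0.0, sigma)
         for _ in range(m)]
    z = n - s
    # U_i is the inner product of an independent column of A_{S^c} with Y
    positives = sum(
        1 for _ in range(z)
        if sum(rng.gauss(0.0, std) * y_u for y_u in Y) > 0
    )
    return positives / z


rng = random.Random(1)
frac = positive_fraction(n=2000, s=5, m=20, tau=1.0, sigma=1.0, mu=3.0, rng=rng)
eps = 0.1
failure_bound = 2.0 * math.exp(-2.0 * eps ** 2 * (2000 - 5))
print(round(frac, 3), failure_bound)
```

The observed fraction should sit close to 1/2 regardless of the signal strength, and the Hoeffding bound on leaving the band $1/2 \pm 0.1$ is already negligible at this problem size.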

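Lemma 3. Let $x$, $S$, $x_S$, $A$, $A_S$, and $w$ be as defined in the
previous lemma. Assume further that the entries of $x$ satisfy
$\sigma\mu \le x_i \le D\sigma\mu$ for $i \in S$, for some $\mu > 0$ and
fixed $D > 1$. Define

$$\Delta = \exp\left( -\frac{m}{32\left(sD^2 + m\mu^{-2}/\tau_{\min}\right)} \right) < 1;$$

then for the $s \times 1$ vector $V = A_S^T A_S x_S + A_S^T w$, both of
the following bounds are valid:

$$\Pr\left( \sum_{i=1}^{s} 1_{\{V_i > 0\}} \ne s \right) \le 2s\Delta^2,$$

$$\Pr\left( \sum_{i=1}^{s} 1_{\{V_i > 0\}} < s(1 - 3\Delta) \right) \le 4\Delta.$$

Proof: Given $A_i$ (the $i$th column of $A$) we have

$$V_i \mid A_i \sim \mathcal{N}\left( \|A_i\|_{\ell_2}^2\, x_i,\ \|A_i\|_{\ell_2}^2 \left( \frac{\tau}{m} \sum_{j \ne i} x_j^2 + \sigma^2 \right) \right),$$

and so, by a standard Gaussian tail bound,

$$\Pr(V_i \le 0 \mid A_i) = \Pr\left( \mathcal{N}(0,1) > \frac{\|A_i\|_{\ell_2}\, x_i}{\sqrt{\frac{\tau}{m}\sum_{j \ne i} x_j^2 + \sigma^2}} \right) \le \exp\left( -\frac{\|A_i\|_{\ell_2}^2\, x_i^2}{2\left(\tau\|x\|_{\ell_2}^2/m + \sigma^2\right)} \right).$$

Now, we can leverage a result on the tails of a chi-squared
random variable from [12] to obtain that, for any $\gamma \in (0, 1)$,
$\Pr\left(\|A_i\|_{\ell_2}^2 \le (1 - \gamma)\tau\right) \le \exp(-m\gamma^2/4)$.
Again we employ conditioning to obtain

$$\Pr(V_i \le 0) \le \int_{A_i : \|A_i\|_{\ell_2}^2 \le (1-\gamma)\tau} 1\, dP_{A_i} + \int_{A_i : \|A_i\|_{\ell_2}^2 > (1-\gamma)\tau} \Pr(V_i \le 0 \mid A_i)\, dP_{A_i}$$

$$\le \exp\left( -\frac{m\gamma^2}{4} \right) + \exp\left( -\frac{(1-\gamma)\tau x_i^2}{2\left(\tau\|x\|_{\ell_2}^2/m + \sigma^2\right)} \right)$$

$$\le \exp\left( -\frac{m\gamma^2}{4} \right) + \exp\left( -\frac{\tau(1-\gamma)\mu^2}{2\left(\tau s D^2\mu^2/m + 1\right)} \right),$$

where the last bound follows from the conditions on the $x_i$.
Now, to simplify, we choose $\gamma = \gamma^* \in (0, 1)$ to balance the
two terms, obtaining

$$\gamma^* = \left( sD^2 + \frac{m}{\tau\mu^2} \right)^{-1} \left[ \sqrt{1 + 2\left( sD^2 + \frac{m}{\tau\mu^2} \right)} - 1 \right].$$

Using the fact that

$$\frac{\sqrt{1 + 2t} - 1}{t} > \frac{1}{2\sqrt{t}}$$

for $t > 1$, we can conclude

$$\gamma^* > \frac{1}{2} \left( sD^2 + \frac{m}{\tau\mu^2} \right)^{-1/2},$$

since $s \ge 1$ by assumption. Now, using the fact that
$\tau \ge \tau_{\min}$, we have that $\Pr(V_i \le 0) \le 2\Delta^2$, where

$$\Delta = \exp\left( -\frac{m}{32\left(sD^2 + m\mu^{-2}/\tau_{\min}\right)} \right).$$

The first result follows from

$$\Pr\left( \sum_{i=1}^{s} 1_{\{V_i > 0\}} \ne s \right) = \Pr\left( \bigcup_{i=1}^{s} \{V_i \le 0\} \right) \le s \max_{i \in \{1, \dots, s\}} \Pr(V_i \le 0) \le 2s\Delta^2.$$

For the second result, let us simplify notation by introducing
the variables $T_i = 1_{\{V_i > 0\}}$ and $t_i = E[T_i]$. By Markov's
inequality we have

$$\Pr\left( \left| \sum_{i=1}^{s} T_i - \sum_{i=1}^{s} t_i \right| > p \right) \le p^{-1}\, E\left[ \left| \sum_{i=1}^{s} T_i - \sum_{i=1}^{s} t_i \right| \right] \le p^{-1} \sum_{i=1}^{s} E[|T_i - t_i|] \le p^{-1} s \max_{i \in \{1, \dots, s\}} E[|T_i - t_i|].$$

Now note that

$$|T_i - t_i| = \begin{cases} 1 - \Pr(V_i > 0), & V_i > 0, \\ \Pr(V_i > 0), & V_i \le 0, \end{cases}$$

and so $E[|T_i - t_i|] \le 2\Pr(V_i \le 0)$. Thus, we have that
$\max_{i \in \{1, \dots, s\}} E[|T_i - t_i|] \le 4\Delta^2$, and so

$$\Pr\left( \left| \sum_{i=1}^{s} T_i - \sum_{i=1}^{s} t_i \right| > p \right) \le 4p^{-1} s\Delta^2.$$

Now, let $p = s\Delta$ to obtain

$$\Pr\left( \sum_{i=1}^{s} T_i < \sum_{i=1}^{s} t_i - s\Delta \right) \le 4\Delta.$$

Since $t_i = 1 - \Pr(V_i \le 0)$, we have
$\sum_{i=1}^{s} t_i \ge s(1 - 2\Delta^2)$, and thus

$$\Pr\left( \sum_{i=1}^{s} T_i < s(1 - 2\Delta^2 - \Delta) \right) \le 4\Delta.$$

The result follows from the fact that $2\Delta^2 + \Delta < 3\Delta$. ■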

Lemma 4. For $0 < p < 1$ and $q \ge 0$, we have
$(1 - p)^q \ge 1 - qp/(1 - p)$.

Proof: We have $\log (1 - p)^q = q\log(1 - p) = -q\log(1 + p/(1 - p)) \ge -qp/(1 - p)$,
where the last bound follows from the fact that $\log(1 + t) \le t$ for
$t \ge 0$. Thus, $(1 - p)^q \ge \exp(-qp/(1 - p)) \ge 1 - qp/(1 - p)$, the
last bound following from the fact $e^t \ge 1 + t$ for all
$t \in \mathbb{R}$. ■
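A quick numerical sanity check of Lemma 4 over a grid of values is sketched below; the grid itself is an arbitrary choice for illustration. In the proof sketch that follows, the lemma is used in the form $1 - (1 - 3\Delta)^{k-1} \le 3(k-1)\Delta/(1 - 3\Delta)$.

```python
def lemma4_gap(p, q):
    """Return (1 - p)**q - (1 - q*p/(1 - p)); Lemma 4 says this is >= 0."""
    return (1.0 - p) ** q - (1.0 - q * p / (1.0 - p))


# p ranges over 0.05, 0.10, ..., 0.95 and q over 0, 0.5, ..., 10
grid = [(p / 20.0, q / 2.0) for p in range(1, 20) for q in range(0, 21)]
worst = min(lemma4_gap(p, q) for p, q in grid)
print(worst)
```

The gap is smallest at $q = 0$, where both sides equal one, and is positive everywhere else on the grid.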

B.  Sketch of Proof of Theorem 1

To establish the main results of the paper, we will first show
that the final set of observations of the CDS procedure is (with
high probability) equivalent in distribution to a set of
observations of the form (1), but with different parameters (smaller
effective dimension $n_{\mathrm{eff}}$ and effective noise power
$\sigma_{\mathrm{eff}}^2$), and for which some fraction of the original
signal components may be absent. To that end, let
$S^{(j)} = S \cap I^{(j)}$ and $Z^{(j)} = S^c \cap I^{(j)}$,
for $j = 1, \dots, k$, denote the (sub)sets of indices of $S$ and its
complement, respectively, that remain to be measured in step $j$.
Note that at each step of the procedure, the "back-projection"
estimate
$\tilde{x}^{(j)} = (A^{(j)})^T y^{(j)} = (A^{(j)})^T A^{(j)} x + (A^{(j)})^T w^{(j)}$
can be decomposed into

$$\tilde{x}^{(j)}_{S^{(j)}} = (A^{(j)}_{S^{(j)}})^T A^{(j)}_{S^{(j)}} x_{S^{(j)}} + (A^{(j)}_{S^{(j)}})^T w^{(j)}$$

and

$$\tilde{x}^{(j)}_{Z^{(j)}} = (A^{(j)}_{Z^{(j)}})^T A^{(j)}_{S^{(j)}} x_{S^{(j)}} + (A^{(j)}_{Z^{(j)}})^T w^{(j)},$$

and these subvectors are precisely of the form specified in the
conditions of Lemmas 2 and 3.

Let $z^{(j)} = |Z^{(j)}|$ and $s^{(j)} = |S^{(j)}|$, and in particular note
that $s^{(1)} = s$ and $z^{(1)} = z = n - s$. Choose the parameters
of the CDS algorithm as specified in Section III. Iteratively
applying Lemma 2 we have that for any fixed $\epsilon \in (0, 1/2)$,
the bounds $(1/2 - \epsilon)^{j-1} z \le z^{(j)} \le (1/2 + \epsilon)^{j-1} z$
hold simultaneously for all $j = 1, 2, \dots, k$ with probability at
least $1 - 2(k-1)\exp\left(-2z\epsilon^2 (1/2 - \epsilon)^{k-2}\right)$, which
is no less than $1 - O\left(\exp(-c_0 n/\log^{c_1} n)\right)$, for some
constants $c_0 > 0$ and $c_1 > 0$, for $n$ sufficiently large². As a
result, with the same probability, the total number of locations in
the set $I^{(j)}$ satisfies $|I^{(j)}| \le s + z(1/2 + \epsilon)^{j-1}$ for
all $j = 1, 2, \dots, k$. Thus, we can lower bound
$\tau^{(j)} = R^{(j)}/|I^{(j)}|$ at each step by

$$\tau^{(j)} \ge \begin{cases} \dfrac{\alpha n \left((1 - 2\alpha)/(1 - \alpha)\right)^{j-1}}{s + z\left((1 + 2\epsilon)/2\right)^{j-1}}, & j = 1, \dots, k-1, \\[2ex] \dfrac{\alpha n}{s + z\left((1 + 2\epsilon)/2\right)^{j-1}}, & j = k. \end{cases}$$

Now, note that when $n$ is sufficiently large³, we have
$s \le z(1/2 + \epsilon)^{j-1}$ holding for all $j = 1, \dots, k$. Letting
$\epsilon = (1 - 3\alpha)/(2 - 2\alpha)$, we can simplify the bounds on
$\tau^{(j)}$ to obtain that $\tau^{(j)} \ge \alpha/2$ for
$j = 1, \dots, k-1$, and $\tau^{(k)} \ge \alpha\log(n)/2$.
The salient point to note here is the value of $\tau^{(k)}$, and in
particular, its dependence on the signal dimension $n$. This
essentially follows from the fact that the set of indices to
measure decreases by a fixed factor with each distillation
step, and so after $O(\log\log n)$ steps the number of indices
to measure is smaller than in the initial step by a factor
of about $\log n$. Thus, for the same allocation of resources
($R^{(1)} = R^{(k)}$), the SNR of the final set of observations is
larger than that of the first set by a factor of $\log n$.

² In particular, we require
$n \ge c_0' (\log\log\log n)(\log n)^{c_1'} / (1 - n^{c_2'/\log\log n - 1})$,
where $c_0'$, $c_1'$, and $c_2'$ are positive functions of $\epsilon$ and
$\beta$.

³ In particular, we require
$n \ge (1 + \log n)^{\log\log n/(\log\log n - \beta)}$.

Now, the final set of observations is
$y^{(k)} = A^{(k)} x^{(k)} + w^{(k)}$, where
$x^{(k)} \in \mathbb{R}^{n_{\mathrm{eff}}}$ (for some $n_{\mathrm{eff}} < n$)
is supported on the set $S^{(k)} = S \cap I^{(k)}$, $A^{(k)}$ is an
$m^{(k)} \times n_{\mathrm{eff}}$ matrix, and the $w_i$ are i.i.d.
$\mathcal{N}(0, \sigma^2)$. We can divide throughout by
$\sqrt{\tau^{(k)}}$ to obtain the equivalent statement
$\tilde{y} = \tilde{A} x^{(k)} + \tilde{w}$, where now the entries of
$\tilde{A}$ are i.i.d. $\mathcal{N}(0, 1/m)$ and the $\tilde{w}_i$ are
i.i.d. $\mathcal{N}(0, \tilde{\sigma}^2)$, where
$\tilde{\sigma}^2 \le 2\sigma^2/(\alpha\log n)$. To bound the overall
squared error we consider the variance associated with estimating the
components of $x^{(k)}$ using the Dantzig selector (cf. Lemma 1),
as well as the (squared) bias arising from the fact that some
signal components may not be present in the final support set
$S^{(k)}$. In particular, with $x^{(k)}$ extended by zeros outside
$I^{(k)}$, a bound for the overall error is given by

$$\|x - \hat{x}\|_{\ell_2}^2 = \|x - x^{(k)} + x^{(k)} - \hat{x}\|_{\ell_2}^2 \le 2\|\hat{x} - x^{(k)}\|_{\ell_2}^2 + 2\|x^{(k)} - x\|_{\ell_2}^2.$$

We can bound the first term by applying the result of Lemma 1
to obtain that (for $\rho_1$ sufficiently large)
$\|\hat{x} - x^{(k)}\|_{\ell_2}^2 = O(s\sigma^2)$ holds with probability
$1 - O(n^{-C_0})$ for some $C_0 > 0$. Now, let
$\delta = (|S| - |S^{(k)}|)/s$ denote the fraction of true signal
components that are rejected by the CDS procedure. Then we
have $\|x^{(k)} - x\|_{\ell_2}^2 = O(s\sigma^2\delta\mu^2)$, and so overall,
we have $\|x - \hat{x}\|_{\ell_2}^2 = O(s\sigma^2 + s\sigma^2\delta\mu^2)$,
with probability $1 - O(n^{-C_0})$. The method for bounding the second
term in the error bound varies depending on the signal amplitude
$\mu$; we consider three cases below.

1) $\mu \ge \mu_0$, where
$\mu_0 = \Theta\left((D/\alpha)\sqrt{\log n/\log\log n}\right)$: Conditioned
on the event that the stated lower bounds for $\tau^{(j)}$ are valid, we
can iteratively apply Lemma 3, taking $\tau_{\min} = \alpha/2$. For
$\rho_0 = 96D^2/\log b$ (where $b$ is the parameter from the expression
for $k$), let $m^{(j)} = \rho_0 s\log n/\log_b \log n$. Then we obtain
that for all $n$ sufficiently large, $\delta = 0$ with probability at
least $1 - O(n^{-C_0/\log\log n})$, for some constant $C_0 > 0$. Since
this term governs the rate, we have overall that
$\|x - \hat{x}\|_{\ell_2}^2 = O(s\sigma^2)$ holds with probability
$1 - O(n^{-C_0/\log\log n})$ as claimed.

2) $\mu_1 \le \mu < \mu_0$, where
$\mu_1 = (16\sqrt{2}/(\alpha\log b))\sqrt{\log\log\log n}$: For this range
of signal amplitude we will need to control $\delta$ explicitly.
Conditioned on the event that the lower bounds for $\tau^{(j)}$ hold, we
iteratively apply Lemma 3 where for $\rho_0 = 96D^2/\log b$,
we let $m^{(j)} = \rho_0 s\log n/\log_b \log n$. Now, we invoke
Lemma 4 to obtain that for $n$ sufficiently large,
$\delta \le 1 - (1 - 3\Delta)^{k-1} = O(e^{-C_1\mu^2})$ with probability
at least $1 - O(e^{-C_1\mu^2})$ for some $C_1 > 0$. It follows that
$\delta\mu^2$ is $O(1)$, and so overall
$\|x - \hat{x}\|_{\ell_2}^2 = O(s\sigma^2)$ with probability
$1 - O(e^{-C_1\mu^2})$.

3) $\mu < \mu_1 = (16\sqrt{2}/(\alpha\log b))\sqrt{\log\log\log n}$:
Invoking the trivial bound $\delta \le 1$, it follows from above that
for $n$ sufficiently large, the error satisfies
$\|x - \hat{x}\|_{\ell_2}^2 = O(s\sigma^2\log\log\log n)$,
with probability $1 - O(n^{-C_2})$ for some constant $C_2 > 0$, as
claimed.